Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

[journald_input] Write journald fields as attributes instead of body. #353

Closed

Conversation

gregoryfranklin
Copy link

Closes: open-telemetry/opentelemetry-collector-contrib#7298

The list of journald fields is defined here

The log data model suggests that the body should contain only the log message, other fields should be set as attributes.

This change applies the following field mappings:
MESSAGE => body
PRIORITY => severity, severity_text
__REALTIME_TIMESTAMP => timestamp

All other fields are stored as attributes.

The list of journald fields is defined here:
  https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html

The following field mappings are applied:
MESSAGE              => body
__REALTIME_TIMESTAMP => timestamp
PRIORITY             => severity, severity_text

All other fields are stored as attributes.
@gregoryfranklin gregoryfranklin requested a review from a team January 21, 2022 16:33
@codecov
Copy link

codecov bot commented Jan 21, 2022

Codecov Report

Merging #353 (91ba7da) into main (962bfd8) will decrease coverage by 0.1%.
The diff coverage is 54.5%.

❗ Current head 91ba7da differs from pull request most recent head 87ae947. Consider uploading reports for the commit 87ae947 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##            main    #353     +/-   ##
=======================================
- Coverage   77.1%   77.0%   -0.2%     
=======================================
  Files         94      94             
  Lines       4471    4492     +21     
=======================================
+ Hits        3451    3460      +9     
- Misses       701     707      +6     
- Partials     319     325      +6     
Impacted Files Coverage Δ
operator/builtin/input/journald/journald.go 52.7% <54.5%> (-0.1%) ⬇️
operator/builtin/input/tcp/tcp.go 78.0% <0.0%> (-1.7%) ⬇️

Comment on lines 244 to 248
str, ok := v.(string)
if !ok {
// attributes only supports strings at the moment
// https://github.com/open-telemetry/opentelemetry-log-collection/issues/190
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to convert common types to a string? I think all values are strings in the first place, but I think we should avoid silently dropping a field in the event it is not a string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can do. The options were to wait for #190 to be fixed (I've not looked at what is needed for this) at which point we'll be able to do this properly or convert everything to a string.

Comment on lines 81 to 112
"_BOOT_ID": "c4fa36de06824d21835c05ff80c54468",
"_CAP_EFFECTIVE": "0",
"_TRANSPORT": "journal",
"_UID": "1000",
"_EXE": "/usr/lib/systemd/systemd",
"_AUDIT_LOGINUID": "1000",
"_PID": "13894",
"_CMDLINE": "/lib/systemd/systemd --user",
"_MACHINE_ID": "d777d00e7caf45fbadedceba3975520d",
"_SELINUX_CONTEXT": "unconfined\n",
"CODE_FUNC": "unit_log_success",
"SYSLOG_IDENTIFIER": "systemd",
"_HOSTNAME": "myhostname",
"MESSAGE_ID": "7ad2d189f7e94e70a38c781354912448",
"_SYSTEMD_CGROUP": "/user.slice/user-1000.slice/user@1000.service/init.scope",
"_SOURCE_REALTIME_TIMESTAMP": "1587047866229317",
"USER_UNIT": "run-docker-netns-4f76d707d45f.mount",
"SYSLOG_FACILITY": "3",
"_SYSTEMD_SLICE": "user-1000.slice",
"_AUDIT_SESSION": "286",
"CODE_FILE": "../src/core/unit.c",
"_SYSTEMD_USER_UNIT": "init.scope",
"_COMM": "systemd",
"USER_INVOCATION_ID": "88f7ca6bbf244dc8828fa901f9fe9be1",
"CODE_LINE": "5487",
"_SYSTEMD_INVOCATION_ID": "83f7fc7799064520b26eb6de1630429c",
"_GID": "1000",
"_SYSTEMD_UNIT": "user@1000.service",
"_SYSTEMD_USER_SLICE": "-.slice",
"__CURSOR": "s=b1e713b587ae4001a9ca482c4b12c005;i=1eed30;b=c4fa36de06824d21835c05ff80c54468;m=9f9d630205;t=5a369604ee333;x=16c2d4fd4fdb7c36",
"__MONOTONIC_TIMESTAMP": "685540311557",
"_SYSTEMD_OWNER_UID": "1000",
Copy link
Member

@dmitryax dmitryax Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.

Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .

cc @djaglowski

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these fields are not compliant with OTel naming guidelines. I would rather recommend to make this change in one leap to the attribute names that will be used going forward than just moving all the fields to attributes as is and changing them after that. This would require semantic conventions PR first.

Unfortunately, I think waiting for the semantic conventions blocks most of this PR for now, as the body will have to remain structured in order to hold most of the fields. I am on board with waiting though, as this is the best way to ensure we are only moving in the right direction.

Shall we:

  1. Have a small PR now that just parses out severity
  2. Make a PR to spec that proposes semantic conventions
  3. Circle back to this after 2 is accepted

@gregoryfranklin, are you on board with this? Are you willing to propose the semantic conventions?

Copy link
Member

@djaglowski djaglowski Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern I have is that I'm not sure we need all these fields. Many of them don't provide real value IMO. I think we can reduce number of attributes to essential ones. That's why I suggested to provide an option to not parse the journald payload in the issue, in case if user do want all of these fields as is .

One design option that comes to mind is something like the preset option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.

# just the default set
- type: journald_input

# default set plus a few extra fields
- type: journald_input
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

# alternate predefined set plus a few extra fields
- type: journald_input
  field_set: some_other_set
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

# only a few specific fields
- type: journald_input
  field_set: none
  fields:
  - USER_INVOCATION_ID
  - _SYSTEMD_USER_UNIT

Copy link
Member

@dmitryax dmitryax Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we:

Have a small PR now that just parses out severity
Make a PR to spec that proposes semantic conventions
Circle back to this after 2 is accepted

The plan looks good to me. And I think 1 should be just an additive change - we can keep the payload and do not remove PRIORITY for now.

One design option that comes to mind is something like the preset option used for severity parsing. Basically, we can have some predefined sets of values that will be parsed. One of these sets is the default. The user can optionally choose another set and/or add additional fields to any set.

This sounds good. Maybe even just fields would be enough. But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even just fields would be enough.

True. If you leave it unspecified you get all that we have semantic conventions for, otherwise you get what you specify.

But would it mean that we have to come up with attribute names (semantic conventions) for all existing journald fields?

I suppose we could make fields a map, where a the key is the journald field name, and the value is the name of the attribute. User can leave value empty if we have a default mapping for the field, otherwise must specify.

- type: journald_input
  fields:
    _BOOT_ID:            # has semconv => ok, defaults to semconv (e.g. "journald.boot.id")
    _COMM: "custom.comm" # no semconv, but user named it => ok
    _PID: "custom.pid"   # has semconv, but user overrode it => ok
    __CURSOR:            # no semconv => unmarshal error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach. I believe that not all the fields are going to be log attributes, some of them should be resource attributes, right? In that case this configuration model needs to be adjusted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. We can finalize design during implementation, but something to this effect should work well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with waiting for semantic conventions to be added to the spec.

I've pushed a suggestion for what this could look like. Feedback is welcome.

Some of the fields will be resource attributes, so I've added a fields and resource_fields config item with a default value and users can override.

For the moment, anything that does not have a semantic convention defined gets passed through unmodified. I've added a commented out list of all the fields defined in the documentation.

Do we also want to specify a list of fields that we drop by default? There may be unknown fields, which I guess should be kept if they are unknown (passed through unmodified), so we should probably only drop known fields if they don't get assigned a semantic convention.

@djaglowski
Copy link
Member

djaglowski commented Apr 11, 2022

Closing this PR, as the codebase is being merged into collector contrib.

On the topic of the PR, it seems the conclusion is that this is the right direction, but it should wait for semantic conventions to be added to the spec before we add support for mapping journald fields to attributes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

journald - Consider parsing more known fields from logs
4 participants